
server: rm gossip usage from node/store/cluster ID allocation #56263

Merged
merged 1 commit into cockroachdb:master from 201103.rm-gossip-start on Nov 9, 2020

Conversation

irfansharif
Contributor

In 20.2, with #52526, we introduced the join RPC, which lets new nodes in
the system be handed a pre-allocated node ID and a pre-allocated store ID,
and be informed by the existing cluster of the cluster ID.
In 20.1 and earlier this functionality was provided by our use of gossip:
we constructed hollowed-out servers (without node/store IDs) and allocated
the IDs at the joining node itself. This required an awkward dance: KV had
to be up in order to allocate the right IDs, yet the server process did
not yet have an ID of its own.

We retained the deprecated gossip codepaths in 20.2 in order to maintain
compatibility with 20.1. Now that 20.2 is cut, however, in 21.1 code we
can always assume that we're talking to nodes that are at least 20.2,
and therefore nodes that are able to use the join RPC correctly. This
lets us strip out our usage of gossip for ID allocation, which in turn
paves the way for a few other simplifications.
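
For illustration, here's a minimal sketch of the shape of that handshake; the type and method names below are hypothetical stand-ins, not the actual definitions in pkg/roachpb or pkg/server:

```go
package example

// JoinRequest and JoinResponse are hypothetical stand-ins for the real
// join RPC messages: the joining node announces its binary version and,
// if accepted, is handed its IDs by the existing cluster.
type JoinRequest struct {
	BinaryVersion string // joining node's binary version
}

type JoinResponse struct {
	ClusterID string // cluster ID the joining node should adopt
	NodeID    int32  // node ID pre-allocated by the existing cluster
	StoreID   int32  // store ID pre-allocated for the node's first store
}

// joiner abstracts "an already-initialized node we can reach".
type joiner interface {
	Join(JoinRequest) (JoinResponse, error)
}

// joinCluster is what a 20.2+ node conceptually does at startup: instead
// of allocating IDs itself via gossip-assisted KV access, it receives
// them from the existing cluster.
func joinCluster(existing joiner, binaryVersion string) (JoinResponse, error) {
	return existing.Join(JoinRequest{BinaryVersion: binaryVersion})
}
```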

We also delete the mixed-version/join-init roachtest, as it was only
ever relevant for the 20.1/20.2 cycle. The current structure around
joining nodes being allocated/informed of the right IDs has adequate
coverage with TestClusterConnectivity.

Release note: None

@irfansharif irfansharif requested review from knz and tbg November 3, 2020 21:01
@irfansharif irfansharif requested a review from a team as a code owner November 3, 2020 21:01
@cockroach-teamcity
Member

This change is Reviewable

Contributor Author

@irfansharif irfansharif left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz and @tbg)


pkg/server/node.go, line 561 at r1 (raw file):

		// first store ID has already been pre-allocated for us.
		//
		// TODO(irfansharif): Is this sound? If we're restarting an already

Sigh, this is unsound on master, #56272. Good thing we didn't officially support multiple stores in 20.2.

#56272 and these TODOs are unrelated to this PR though, I just noticed given I was in the area. I'll work on cleaning them up separately.

Member

@tbg tbg left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif, @knz, and @tbg)


pkg/server/init.go, line 181 at r1 (raw file):

// on persisted data instead of this flag.
//
// [1]: This is not technically true for mixed version clusters where we leave

Hmm, isn't this gone now, too?


pkg/server/init.go, line 375 at r1 (raw file):

		if errors.Is(err, ErrIncompatibleBinaryVersion) {
			// Propagate upwards; these is an error condition the caller knows

nit: this is


pkg/server/node.go, line 389 at r1 (raw file):

	// Inform the RPC context of the node ID.
	n.storeCfg.RPCContext.NodeID.Set(ctx, nodeID)

This is done elsewhere already?


pkg/server/node.go, line 561 at r1 (raw file):

Previously, irfansharif (irfan sharif) wrote…

Sigh, this is unsound on master, #56272. Good thing we didn't officially support multiple stores in 20.2.

#56272 and these TODOs are unrelated to this PR though, I just noticed given I was in the area. I'll work on cleaning them up separately.

I mentioned this in #56272, but I'm looking again and still don't see the problem. storeIDAlloc is the number of storeIDs we still need. We pass that to allocateStoreIDs which gets us that many IDs from the KV layer. startID is the first ID we got back.

Oh, is the problem that the Join RPC also allocated a NodeID/StoreID for us but that we're not properly persisting it? I just briefly looked at the code and was expecting the initServer to actually bootstrap the store with the NodeID/ClusterID/StoreID but now I'm not sure where that happens. Oh, it's in this code via firstStoreID? OK, that is all surprising and now I can see why you're concerned. This does need to be cleaned up. I would generally prefer if the auxiliary and first store were treated separately, just to get the code clean. In particular, it looks like we're bootstrapping the first store asynchronously?!
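
For what it's worth, a minimal sketch of the arithmetic being discussed here; the helper and signature below are illustrative only (the real allocator in pkg/server is built on a KV increment):

```go
package example

import "context"

// incrementer stands in for the KV operation that atomically bumps the
// store-ID generator key by a delta and returns the new value.
type incrementer func(ctx context.Context, delta int64) (newValue int64, err error)

// allocateStoreIDs reserves n consecutive store IDs and returns the first
// one, mirroring the startID/storeIDAlloc discussion above: bumping the
// counter by n claims the block [newValue-n+1, newValue], and callers
// hand out startID, startID+1, ... to the stores being bootstrapped.
func allocateStoreIDs(ctx context.Context, inc incrementer, n int64) (startID int64, err error) {
	newValue, err := inc(ctx, n)
	if err != nil {
		return 0, err
	}
	return newValue - n + 1, nil
}
```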


pkg/server/server.go, line 1396 at r1 (raw file):

	s.rpcContext.ClusterID.Set(ctx, state.clusterID)
	s.rpcContext.NodeID.Set(ctx, state.nodeID)

ah, here it is.

@tbg tbg self-requested a review November 4, 2020 14:42
Member

@tbg tbg left a comment

Reviewed 6 of 6 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif and @knz)

Contributor Author

@irfansharif irfansharif left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif, @knz, and @tbg)


pkg/server/node.go, line 561 at r1 (raw file):
Like I mentioned in #56271 (comment), I think our saving grace here around the number of store IDs we allocated was accidental.

In particular, it looks like we're bootstrapping the first store asynchronously?!

Yup, yikes. Here's what I think the new code structure should be.

- In the init code:
  - If we're being bootstrapped:
    - Only initialize first store, leave remaining stores to start up code
      later
  - If we're joining an existing cluster:
    - Only initialize first store, leave remaining stores to start up code
      later

- Later, bootstrapping additional new stores:
  - Allocate len(auxiliary engines) store IDs, and bootstrap them
    asynchronously. 

This should let us avoid threading in the first store ID, and always rely on
the KV increment operation to tell us what the first auxiliary store ID should
be.

Cases to consider:
- [x] Bootstrapping first node with single engine
- [x] Bootstrapping first node with multiple engines
- [x] Joining a bootstrapped cluster, with single engine
- [x] Joining a bootstrapped cluster, with multiple engines
- [x] Restarting a bootstrapped node with no new engines
- [x] Restarting a bootstrapped node with new engines

What do you think? Ignoring the gossip code paths here because I want to remove them anyway.
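
For concreteness, a rough sketch of the proposed split; the types and helpers below are hypothetical placeholders, not the actual pkg/server code:

```go
package example

import "context"

// engine and store stand in for the storage engine and the bootstrapped
// store; they're placeholders for illustration only.
type engine struct{ path string }
type store struct{ id int64 }

// initFirstStore is what the init code would do synchronously, whether we
// bootstrapped the cluster ourselves or were handed our first store ID
// through the join RPC.
func initFirstStore(first engine, firstStoreID int64) store {
	return store{id: firstStoreID}
}

// initAdditionalStores runs later, during node startup: it allocates
// len(auxiliary) consecutive store IDs in a single KV increment and
// initializes the remaining engines, which can safely happen
// asynchronously.
func initAdditionalStores(
	ctx context.Context,
	auxiliary []engine,
	allocateStoreIDs func(ctx context.Context, n int64) (startID int64, err error),
) ([]store, error) {
	if len(auxiliary) == 0 {
		return nil, nil
	}
	startID, err := allocateStoreIDs(ctx, int64(len(auxiliary)))
	if err != nil {
		return nil, err
	}
	stores := make([]store, 0, len(auxiliary))
	for i := range auxiliary {
		stores = append(stores, store{id: startID + int64(i)})
	}
	return stores, nil
}
```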

irfansharif added a commit to irfansharif/cockroach that referenced this pull request Nov 4, 2020
...when joining an existing cluster. This diff adds some sanity around
how we bootstrap stores for nodes when we're informed by the existing
cluster what the first store ID should be. Previously we were
bootstrapping the first store asynchronously, and that is not what we
want.

We first observed the implications of doing so in cockroachdb#56263, which
attempted to remove the use of gossip in cluster/node ID distribution.
There we noticed that our somewhat haphazard structure around
initialization of stores could lead to doubly allocating store IDs
(cockroachdb#56272). We were inadvertently safeguarded against this, as described
in cockroachdb#56271, but this structure is still pretty confusing and needed
cleanup.

Now the new store initialization structure is the following:
```
- In the init code:
  - If we're being bootstrapped:
    - Initialize all the stores
  - If we're joining an existing cluster:
    - Only initialize first store, leave remaining stores to start up code
      later

- Later, when initializing additional new stores:
  - Allocate len(auxiliary engines) store IDs, and initialize them
    asynchronously.
```

This lets us avoid threading in the first store ID, and always rely on
the KV increment operation to tell us what the store ID should be for
the first additional store. We update TestAddNewStoresToExistingNodes to
test allocation behaviour with more than just two stores.

Eventually we could simplify the init code to only initialize the first
store when we're bootstrapping (there's a longstanding TODO from Andrei
to that effect), but it's not strictly needed. This PR unblocks cockroachdb#56263.

Release note: None
craig bot pushed a commit that referenced this pull request Nov 5, 2020
56299: server: initialize first store within the init server r=irfansharif a=irfansharif

rytaft pushed a commit to rytaft/cockroach that referenced this pull request Nov 5, 2020
@irfansharif irfansharif force-pushed the 201103.rm-gossip-start branch 2 times, most recently from b533630 to 1d97fb3 Compare November 6, 2020 05:06
Contributor Author

@irfansharif irfansharif left a comment

PTAL, rebased atop #56299 and #56302 (first commit here, to be ignored).

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz and @tbg)


pkg/server/init.go, line 181 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Hmm, isn't this gone now, too?

Done.


pkg/server/node.go, line 561 at r1 (raw file):

Previously, irfansharif (irfan sharif) wrote…

Like I mentioned in #56271 (comment), I think our saving grace here around the number of store IDs we allocated was accidental. […]

Cleaned up across #56302 and #56299.

Member

@tbg tbg left a comment

so satisfying to see this cleaned up.

Reviewed 2 of 8 files at r2, 6 of 6 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif and @knz)

@irfansharif irfansharif force-pushed the 201103.rm-gossip-start branch from 1d97fb3 to 967361f Compare November 6, 2020 13:48
@irfansharif
Contributor Author

TFTR!

bors r+

@craig
Contributor

craig bot commented Nov 6, 2020

Build failed:

@irfansharif irfansharif force-pushed the 201103.rm-gossip-start branch from 967361f to dab8955 Compare November 9, 2020 16:41
@irfansharif
Contributor Author

I'm a bit lost with the CI failures here. Looks like it's timing out for some of our docker acceptance tests? See here. But I tried running them on my gceworker:

```
$ make acceptance TESTS=TestDockerRuby PKG=./pkg/acceptance TESTFLAGS='-v -show-logs'
...
--- PASS: TestDockerRuby (10.07s)
    --- PASS: TestDockerRuby/runMode=docker (5.46s)
    --- PASS: TestDockerRuby/runMode=docker#01 (4.61s)
PASS
```

And they work? I'm a bit lost. @knz, @tbg: any ideas what I'm missing? I took a cursory glance at our acceptance testing harness setup, and nothing stands out to me. And the logs aren't super helpful. I've kicked off another run, but I'm pretty sure it'll fail as well (it has before).

@irfansharif
Contributor Author

bors r+

@craig
Contributor

craig bot commented Nov 9, 2020

Build succeeded:

@craig craig bot merged commit 791cae9 into cockroachdb:master Nov 9, 2020
@irfansharif irfansharif deleted the 201103.rm-gossip-start branch November 9, 2020 19:49
@irfansharif
Contributor Author

🤷‍♀️

@knz
Contributor

knz commented Nov 10, 2020

very nice piece of work 💯 🎉
